SPUG/APL: EDA - 1 - May 1983
AN EXPLORATORY DATA ANALYSIS PACKAGE
FOR THE COMMODORE SUPERPET
AUTHORS:
M.P. McFarlane Head, Centre of Applied Research Methodology,
Darling Downs Institute of Advanced Education,
Queensland, Australia.
D.R. McNeil Professor of Statistics, Macquarie University.
INTRODUCTION
Exploratory Data Analysis (EDA) focuses attention upon detecting and
examining patterns in data. It is concerned with observing, question-
ing, posing models and explanations, and confronting relationships, in
the quest for understanding.
This approach to data analysis requires tools and aids which are easy
to use and which assist the researcher to concentrate on the primary
tasks of analysis. This EDA package is designed to do just that. It is
designed to enable researchers, teachers, students and practitioners to
easily make use of a number of exploratory strategies developed by John
Tukey.
In particular, the package has been specifically designed to complement
a book by one of the authors, Donald R. McNeil, entitled "Interactive
Data Analysis: A Practical Primer", Wiley-Interscience, John Wiley &
Sons, Inc., New York, 1977, and is based upon the APL listings outlined
in that book.
One other prominent textbook in this area, written by John W. Tukey of
Princeton University, is: "Exploratory Data Analysis", Addison-Wesley
Publishing Co., 1977.
ACKNOWLEDGEMENT
The authors of the EDA package wish to acknowledge the generous support
of Commodore Australia in the provision of equipment.
The EDA disk came to SPUG from Australia via Waterloo. In order to use
disk space more efficiently, it has been reorganized substantially by
Steve Zeller (see Appendix).
THE EDA PACKAGE
I. GENERAL DESCRIPTION
A. FUNCTIONAL RELATIONSHIPS
Twelve APL function groupings are used for data analysis.
They are:
1. STEMLEAF - Stem and Leaf Displays
2. BOXPLOT - Boxplots
3. CONDENSE - Numbered summaries of data sets
4. SCAT - Scatter Plots
5. LINE - Regression Line (Univariate)
6. COMPARE - Multiple Box Plots
7. MEDPOLISH - Median Polish
8. CTABLE - Coded Tables
9. SMOOTH - Smoothing
10. CENTER - Center Estimates of a Batch
11. REGRESS - Robust Regression
12. ADDFIT - Analysis of Two-Way Tables
Initially, the "main" APL workspace is empty. The needed
functions are then selected via a menu and read into the
workspace (which then can be saved under an alternative name,
if desired).
An APL text dataset is available for each of these EDA
functions.
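
As an illustration of the first display in the list above, here is a
minimal Python sketch of a stem-and-leaf display. It is not the
package's APL code: the name stem_and_leaf and the sample values are
made up for this example, and the sketch assumes non-negative integers
with the tens as stems and the units as leaves.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[value // 10].append(value % 10)

    lines = []
    # Show every stem between the smallest and largest, even empty ones,
    # so that gaps in the batch stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical batch, not one of the sample datasets on the disk.
    for line in stem_and_leaf([12, 15, 21, 27, 27, 40]):
        print(line)
```

Empty stems are kept in the output on purpose, since a gap in the
display is itself a pattern worth noticing.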
B. DATA SETS
Seven sample datasets are supplied by the authors. They are:
INSECTS, PHONES, AIRMILES, ACIDS, CRIMES, LOBSTERS, and
DEATHS.
An APL sequential dataset is available for each of these.
C. GROUPED FUNCTIONS
The twelve data analysis packages listed above can be
reorganised into six groups, following the text by McNeil:
1. DISPLAYS (STEMLEAF, BOXPLOT & CONDENSE)
2. COMPARISONS (COMPARE & CONDENSE)
3. RELATIONS (SCAT & LINE)
4. TABLES (MEDPOLISH, SCAT, CONDENSE & CTABLE)
5. SMOOTHING (SMOOTH, SCAT, LINE & CONDENSE)
6. FITTING (CENTER & REGRESS)
Such organization is the responsibility of the user.
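
Since the grouping is left to the user, it may help to see it written
down as data. The Python sketch below records the six groups exactly as
listed above and works out which functions a chosen set of groups needs.
The helper names functions_needed and ungrouped are invented for this
example and are not part of the package.

```python
EDA_FUNCTIONS = [
    "STEMLEAF", "BOXPLOT", "CONDENSE", "SCAT", "LINE", "COMPARE",
    "MEDPOLISH", "CTABLE", "SMOOTH", "CENTER", "REGRESS", "ADDFIT",
]

# The six groups, copied from the list above.
GROUPS = {
    "DISPLAYS": ["STEMLEAF", "BOXPLOT", "CONDENSE"],
    "COMPARISONS": ["COMPARE", "CONDENSE"],
    "RELATIONS": ["SCAT", "LINE"],
    "TABLES": ["MEDPOLISH", "SCAT", "CONDENSE", "CTABLE"],
    "SMOOTHING": ["SMOOTH", "SCAT", "LINE", "CONDENSE"],
    "FITTING": ["CENTER", "REGRESS"],
}


def functions_needed(group_names):
    """List each function required by the chosen groups once, in first-seen order."""
    needed = []
    for group in group_names:
        for function in GROUPS[group]:
            if function not in needed:
                needed.append(function)
    return needed


def ungrouped():
    """Functions from the list of twelve that appear in none of the six groups."""
    grouped = {fn for members in GROUPS.values() for fn in members}
    return [fn for fn in EDA_FUNCTIONS if fn not in grouped]


if __name__ == "__main__":
    print(functions_needed(["DISPLAYS", "TABLES"]))
    print(ungrouped())
```

Written this way, it is easy to see that CONDENSE is shared by four of
the groups and that ADDFIT belongs to none of them.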
D. PARAMETERS
Twenty-one parameters are used by the various EDA functions.
For convenience, these are all included in the "main"
workspace. Their actual use in the twelve data analysis areas
is detailed in the Appendix. Since most parameters are
singlets, they do not take up a great deal of storage.
E. FILE NAMING CONVENTIONS
The original disk consisted of 39 workspaces. This made
extremely inefficient use of disk space and made access to
functions in other workspaces slow. The reorganized disk
consists of: (a) text files of information, largely in APL,
(b) APL functions stored individually as APL sequential
files, and (c) an APL workspace containing utilities to draw
in the relevant EDA APL functions and establish them in the
workspace.
All files are prefixed with "EDA." and end in either ".TXT"
(for ASCII text files), ".INF" (for APL text files), ".AFN"
(for APL function stored as a character matrix in an APL
sequential file), or ".AWS" (for an APL workspace). The
remaining eight characters in the file name are used to
identify the nature of the particular file's contents.
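
The naming convention can be checked mechanically. The Python sketch
below splits a file name into the "EDA." prefix, the identifying
characters, and one of the four suffixes listed above. The function
name describe is invented for this example. The ".ADT" suffix mentioned
in the Appendix is left out because this section does not list it.

```python
PREFIX = "EDA."

# Suffixes and meanings as listed in this section.
SUFFIXES = {
    ".TXT": "ASCII text file",
    ".INF": "APL text file",
    ".AFN": "APL function (character matrix in an APL sequential file)",
    ".AWS": "APL workspace",
}


def describe(filename):
    """Return (identifier, kind) if the name follows the convention, else None."""
    if not filename.startswith(PREFIX):
        return None
    suffix = filename[-4:]
    if suffix not in SUFFIXES:
        return None
    # Whatever sits between the prefix and the suffix identifies the contents.
    identifier = filename[len(PREFIX):-4]
    if not identifier:
        return None
    return identifier, SUFFIXES[suffix]


if __name__ == "__main__":
    for name in ("EDA.OVERVIEW.TXT", "EDA.MAIN.AWS", "README.TXT"):
        print(name, "->", describe(name))
```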
III. USE OF THE EDA PACKAGE
The authors have assumed a basic proficiency in APL. Users lacking
this background are invited to work through the tutorial exercises
in the SuperPET microAPL manual. Before you begin using this disk,
REMEMBER TO MAKE A BACKUP COPY!
A. GETTING STARTED
1. Load the APL interpreter into the SuperPET from the main
menu;
2. Place the EDA disk into either drive 0 or drive 1 and
establish the workspace ID as:
)WSID DISK/0.EDA.MAIN.AWS (for drive 0), or
)WSID DISK/1.EDA.MAIN.AWS (for drive 1)
3. The SuperPET will respond with: WAS CLEAR WS.
4. Load the main EDA APL workspace with:
)LOAD
and the main workspace will be loaded into RAM.
5. This workspace does not contain any EDA analysis
capabilities (press shift-"3" on the numeric pad to see
the contents of the WS). The APL functions present are
used only to pull in the desired EDA functions. They are
all prefixed by the APL symbol "delta".
The first step is to make sure the software knows which
disk contains the APL functions. Look at the variable
"disk" (preceded by delta). It should be either
'DISK/0.' or 'DISK/1.' and should correspond to the disk
you are now using for EDA (Note: the trailing period is
important). If it doesn't match, change its value using
APL assignment of a character vector.
6. Now invoke the utilities by typing:
EDA
You will be shown a menu of the twelve EDA functional
groupings listed above. Upon the selection of one, you
will be given the opportunity to see the documentation
that accompanies this EDA function and then the relevant
APL functions will be drawn in. You can add other analy-
sis capabilities to the workspace by returning to the
menu.
7. At this point, you are now ready to begin exploratory
data analysis. Example data sets are available on disk
and can be pulled in with: GETDATA.
Most of the utilities in the WS can be removed by
typing:
CLEARDELTAS
8. One last note: to avoid inadvertently saving over this
workspace, change its name using the )WSID command. I find
it useful to have the EDA disk in drive 1 and then save
workspaces that I have created on drive 0.
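
Step 5 above stresses the trailing period in the disk variable. The
Python sketch below shows why it matters, on the assumption that the
utilities build a full file name by simply joining the disk value and
the file name. That joining rule is an assumption made for this example
and is not confirmed by the package; the names full_name and
is_valid_disk are likewise invented.

```python
VALID_DISKS = ("DISK/0.", "DISK/1.")


def is_valid_disk(disk):
    """True when the value is one of the two forms the text allows."""
    return disk in VALID_DISKS


def full_name(disk, filename):
    """Join the disk value and a file name (assumed behaviour of the utilities)."""
    return disk + filename


if __name__ == "__main__":
    # With the trailing period, the result matches the )WSID form in step 2.
    print(full_name("DISK/1.", "EDA.MAIN.AWS"))
    # Without it, the drive number runs into the file name.
    print(full_name("DISK/1", "EDA.MAIN.AWS"))
    print(is_valid_disk("DISK/1"))
```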
B. PROBLEMS
The most likely problem will be one of running out of space
(the dreaded message: WS FULL). The only recourse is to
delete unneeded functions and data. If that still does not
resolve the problem, then the particular problem is too big
for the SuperPET.
If you require any additional help with this package, write
Steve Zeller and give as much specific information about your
problem as you can think of.
APPENDIX
INTRODUCTION
The original disk from Australia came with 40 APL workspaces taking up
virtually the entire side of an 8050 floppy disk. Each APL function,
for example, appeared in at least three workspaces. This repetitive
approach, of course, results in very inefficient use of disk space.
This is a problem for SPUG users who have 4040 drives and it means that
other related APL workspaces cannot be placed on the EDA disk. In ad-
dition, copying functions and/or data from other workspaces is a very
slow way to use the disk. To address these problems, the SPUG version
of the EDA disk has been reorganized substantially. This new organiza-
tion frees up a substantial amount of disk space but still provides
users with all the EDA functions and data.
The following table lists the basic APL functions down the left-hand
stub and indicates which functions are used in each of the twelve EDA
functions. These functions are brought into the EDA workspace automa-
tically as needed (using the EDA main menu). Each function is stored
individually on the EDA/SPUG disk as an APL character matrix which is
then established in the workspace via the <quad>FX system function. The
file name of each function is prefixed by 'EDA.' and ends with '.AFN'.
The sample datasets are also stored individually on disk. Their
filenames end with '.ADT'. Under this scheme, storage requirements are
reduced. Furthermore, retrieving functions and data stored in this
fashion is faster than with the )COPY command.
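
The idea of keeping a function as a character matrix and establishing
it only when needed can be sketched in Python as a rough analogy. This
is not how APL or the <quad>FX system function is implemented: the
names to_char_matrix and fix, and the small median function used as
sample text, are all invented for this example.

```python
def to_char_matrix(source):
    """Pad every line to the same width, like the rows of a character matrix."""
    lines = source.splitlines()
    width = max(len(line) for line in lines)
    return [line.ljust(width) for line in lines]


def fix(matrix, workspace):
    """Define the stored function in `workspace` and return the new names."""
    # Drop the padding, rebuild the text, then let Python define the function.
    source = "\n".join(row.rstrip() for row in matrix) + "\n"
    before = set(workspace)
    exec(source, workspace)
    return sorted(set(workspace) - before - {"__builtins__"})


# Hypothetical function text, standing in for the contents of one '.AFN' file.
MEDIAN_SRC = """def median(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
"""

if __name__ == "__main__":
    matrix = to_char_matrix(MEDIAN_SRC)
    workspace = {}
    print(fix(matrix, workspace))
    print(workspace["median"]([3, 1, 2]))
```

Because each function lives in its own file, only the text of the
functions actually selected has to be read and established.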
TABLE 1: EDA/APL Function Relationships
                EDA FUNCTION
        :-----------------------------------
        : s b c s l c m c s c r a
        : t o o c i o e t m e e d
        : e x n a n m d a o n g d
        : m p d t e p p b o t r f
        : l l e     a o l t e e i
        : e o n     r l e h r s t
        : a t s     e i       s
APL     : f   e       s
FUNCTION:             h
--------------------------------------------
stemleaf X
rownames X X X X
boxplot X
fill X X
condense X X
dscat X
scat X
line X
compare X
medpol X
ctable X
a3r X
b3r X
smooth3r X
split X
smoothr X
center X
regress X
addfit X
----------------------------------------------